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We present the first results of the PACS-CS project which aims to simulate 2+1 flavor lattice 
QCD on the physical point with the nonperturbatively 0(a)-improved Wilson quark action and the 
Iwasaki gauge action. Numerical simulations are carried out at /3 = 1.9, corresponding to the lattice 
spacing of a = 0.0907(13) fm, on a 32^ x 64 lattice with the use of the domain-decomposed HMC 
algorithm to reduce the up-down quark mass. Further algorithmic improvements make possible the 
simulation whose up-down quark mass is as light as the physical value. The resulting pseudoscalar 
meson masses range from 702 MeV down to 156 MeV, which clearly exhibit the presence of chiral 
logarithms. An analysis of the pseudoscalar meson sector with SU(3) chiral perturbation theory 
reveals that the next-to-leading order corrections are large at the physical strange quark mass. In 
order to estimate the physical up-down quark mass, we employ the SU(2) chiral analysis expanding 
the strange quark contributions analytically around the physical strange quark mass. The SU(2) 
low energy constants h and U are comparable with the recent estimates by other lattice QCD 
calculations. We determine the physical point together with the lattice spacing employing ni-^, niK 
and mn as input. The hadron spectrum extrapolated to the physical point shows an agreement 
with the experimental values at a few % level of statistical errors, albeit there remain possible 
cutoff effects. We also find that our results of f^ = 134.0(4.2) MeV, fx = 159.4(3.1) MeV and 
fK/fn = 1.189(20) where renormalization is carries out perturbatively at one loop and the errors 
are statistical only, are compatible with the experimental values. For the physical quark masses we 
obtain m^ = 2.527(47) MeV and mf^ = 72.72(78) MeV extracted from the axial-vector Ward- 
Takahashi identity with the perturbative renormalization factors. We also briefly discuss the results 
for the static quark potential. 

PACS numbers: ll.15.Ha, 12.38.-t, 12.38.Gc 



I. INTRODUCTION 

Lattice QCD is expected to be an ideal tool to under- 
stand the nonperturbative dynamics of strong interac- 
tions from first principles. In order to fulfill this promise, 
the first step should be to establish QCD as the funda- 
mental theory of the strong interaction by reproducing 
basic physical quantities, e.g., the hadron spectrum, with 
the systematic errors under control. This is about to be 
attained thanks to the recent progress of simulation algo- 
rithms and the availability of increasingly more powerful 
computational resources. 

Among various systematic errors, the two most trou- 
blesome are quenching effects and chiral extrapolation 
uncertainties. After the systematic studies on the hadron 
spectrum in quenched and two-flavor QCDfl], 0, Qi the 
CP-PACS and JLQCD collaborations performed a 2-1-1 
flavor full QCD simulation employing the nonpertur- 
batively 0(a)-improved Wilson quark actionjj] and the 
Iwasaki gaugg^ action :_5!] on a (2 fm)^ lattice at three lat- 
tice spacings[6|,LZ|. While the quenching effects were suc- 
cessfully removed, we were left with a long chiral extrapo- 
lation: the lightest up-down quark mass reached with the 
plain HMC algorithm was about 67 MeV corresponding 



to m-^/mp « 0.6. 

The PACS-CS project, which is based on the PACS- 
CS (Parallel Array Computer System for Computa- 
tional Sciences) computer with a peak speed of 14.3 
Tflops developed at University of Tsukubaj^, [3, [l(| |. 
aims at calculations on the physical point to remove 
the ambiguity of chiral extrapolations. It employs the 
same quark and gauge actions as the previous CP- 
PACS/JLQCD work, but uses a different simulation al- 
gorithm: the up-down quark mass is reduced by us- 
ing the domain-decomposed HMC (DDHMC) algorithm 
with the replay trick[ll|, [l2|- At the lightest up-down 
quark mass, which is about 3 MeV, several algorith- 
mic improvements are incorporated, including the mass- 
preconditioning [ij, [ij], the chronological inverter 15 1. 
and the deflation technique[lg|. For the strange quark 
part we improve the PHMC algorithm p^ ITsl flOJ with 
the UV-filtering procedure |20l I2H . 

So far our simulation points cover from 702 MeV to 
156 MeV for the pion mass. While we still have to re- 
duce the pion mass by 21 MeV to reach the real physical 
point, we consider that the findings so far already merits 
a detailed report. In this paper we focus on the following 
points: (i) several algorithmic inrprovements make possi- 



ble a simulation with the up-down quark mass as hght as 
the physical value, (ii) The range of pion mass we have 
simulated is sufhciently light to deserve chiral analyses 
with the chiral perturbation theory (ChPT), which re- 
veals that the strange quark mass is not small enough to 
be treated by the SU(3) ChPT up to the next-to-leading 
order (NLO). (iii) The SU(2) chiral analysis on the pion 
sector and the linear chiral extrapolation for other hadron 
masses yield the hadron spectrum at the physical point 
which is compatible with the experimental values at a 
few % level of statistical errors. 

This paper is organized as follows. In Sec.[TTlwe present 
the simulation details. Measurements of hadron masses, 
pseudoscalar meson decay constants and quark masses 
are described in Sec. lIIII In Sec. II VI we make chiral analy- 
ses on the pseudoscalar meson sector using the SU(3) and 
SU(2) ChPTs. We present the values of low energy con- 
stants and discuss convergences of the SU(3) and SU(2) 
chiral expansions. The results of hadron spectrum at the 
physical point are given in Sec. FVl together with the pseu- 
doscalar meson decay constants and the quark masses. In 
Sec.|VT]we show the results for the static quark potential. 



Our conclusions are summarized in Sec. lVIll Appendices 
are devoted to describe the algorithmic details. Prelimi- 
nary results have been reported in Refs. [22i. i23l. |24| . 



II. SIMULATION DETAILS 

A. Actions 

We employ the Iwasaki gauge actionjSll and the nonper- 
turbatively 0(a)-improved Wilson quark action as in the 
previous CP-PACS/JLQCD work. The former is com- 
posed of a plaquettc and a 1 x 2 rectangle loop: 
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with ci = -0.331 and cq = 1 - 8ci = 3.648. The latter 
is expressed as 
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where we consider the case of a degenerate up and down 
quark mass Ku = ^d- The Euclidean gamma matrices 
are defined in terms of the Minkowski matrices in the 
Bjorken-Drell convention: 7^ = —i^g£i {] — 1,2,3), 74 = 
7bD' 75 = 1%D and CT^^ = kbt^nA- The field strength 
F^i, in the clover term is given by 
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The improvement coefficient cgw for 0{a) improvement 
was determined nonpcrturbatively in Ref . |4j] . 



B. Simulation parameters 

Our simulations are carried out at /3 = 1.90 on a 
32^ X 64 lattice for which we use csw = 1.715 [1|- 



This /3 value is one of the three in the previous CP- 
PACS/JLQCD work, whereas the lattice size is enlarged 
from 20^ x 40 to investigate the baryon masses. The 
lattice spacing is found to be 0.0907(14) fm whose de- 
termination is explained later. Table |I] lists the run 
parameters of our simulations. The six combinations 
of the hopping parameters (Kud,Ks) are chosen based 
on the previous CP-PACS/JLQCD results. The heav- 
iest combination (kucIjKs) = (0.13700,0.13640) in this 
work corresponds to the lightest one in the previous CP- 
PACS/JLQCD simulations, which enable us to make a 
direct comparison of the two results with different lat- 
tice sizes. The physical point of the strange quark at 
/3 = 1.90 was estimated as k^ = 0.136412(50) in the CP- 
PACS/JLQCD work[i0]. This is the reason why all our 
simulations are carried out with Kg = 0.13640, the one 
exception being the run at (^ud, ^s) = (0.13754, 0.13660) 
to investigate the strange quark mass dependence. After 
more than 1000 MD time for thermalization we calculate 
hadronic observables solving quark propagators at every 
10 trajectories for n^d > 0.13770 and 20 trajectories for 
'^ud = 0.13781, while we measure the plaquettc expecta- 
tion value at every trajectory. 



C. Algorithm 

Our base algorithm for penetrating into the sniaU mass 
region for a degenerate pair of up and down quarks is the 
DDHMC algorithm [Tij. The effectiveness of this algo- 
rithm for reducing the quark mass was already shown in 
the Nf = 2 case|lll[25l.[26|. We found that it works down 
to Kiid = 0.13770 (or m^ « 300 MeV) on our 32^ x 64 
lattice. Moving closer to the physical point, however, we 
found it necessary to add further enhancements including 
mass preconditioning, which we call mass-preconditioned 
DDHMC (MPDDHMC). This is the algorithm we apphed 
at our hghtest point at K^d = 0.13781. 

The characteristic feature of the DDHMC algorithm 
is a geometric separation of the up-down quark determi- 
nant into the UV and the IR parts, which is implemented 
by domain-decomposing the full lattice into small blocks. 
We choose 8^ for the block size, being less than (1 fm)"* 
in physical units and small enough to reside within a 
computing node of the PACS-CS computer. The latter 
feature is computationally advantageous since the calcu- 
lation of the UV part requires no communication between 
blocks so that the inter-node communications are sizably 
reduced. 

The UV/IR separation enables the application of mul- 
tiple time scale integration schemes [23], which reduces 
the simulation cost substantially. In our simulation 
points we find that the relative magnitudes of the force 
terms are 
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where we adopt the convention ||A/|p = — 2tr(M^) for 
the norm of an element M of the SU(3) Lie algebra, 
and Fg denotes the gauge part and Fuv.ir, are for the 
UV and the IR parts of the up-down quarks. The as- 
sociated step sizes for the forces are controlled by three 
integers A'^o^i_2 introduced by Srg = T/N0N1N2, Stuv = 
T/N1N2, (5tir = r/A^2 with r the trajectory length. The 
integers A'0,1.2 should be chosen such that 
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The relative magnitudes between the forces in Eq. ([8]) tell 
us that (5riR may be chosen roughly 16 times as large as 
dTg and 4 times that of (5tuv , which means that we need 
to calculate Fir an order of magnitude less frequently in 
the molecular dynamics trajectories. Since the calcula- 
tion of Fir contains the quark matrix inversion on the 
full lattice, which is the most computer time consuming 
part, this integration scheme saves the simulation cost 
remarkably. 

The values for iVo,i,2 are listed in TableHl where A^o and 
Ni are fixed at 4 for all the hopping parameters, while the 
value of A''2 is adjusted taking account of acceptance rate 
and simulation stability. The threshold for the replay 
trick[ll|, [12,] for dealing with instabilities of molecular 
dynamics trajectories leading to large values of dH is set 
to be AH > 2. 



For the strange quark, we employ the UV-filtered 
PHMC (UVPHMC) algorithm^. The UVPHMC ac- 
tion for the strange quark is obtained through the UV- 
filtering[20| applied after the even-odd site precondition- 
ing for the quark matrix. The domain-decomposition is 
not used. The polynomial approximation is corrected by 
the global Metropolis test [2^|. Since we find ||Fs|| « 
||Fir||, the step size is chosen as Sts = Stib,- The polyno- 
mial order for UVPHMC, which is denoted by iVpoiy in 
Table m is adjusted to yield high acceptance rate for the 
global Metropolis test at the end of each trajectory. 

The inversion of the Wilson-Dirac operator D on 
the full lattice is carried out by the SAP (Schwarz al- 
ternating procedure) preconditioned GCR solver. The 
preconditioning is accelerated with the single-precision 
arithmetic [29J. We employ the stopping condition \Dx — 
b\/\b\ < 10~^ for the force calculation and 10""'^'* for 
the Hamiltonian, which guarantees the reversibility of 
the molecular dynamics trajectories to a high precision: 
\AU\ < 10-12 for the hnk variables and \AH\ < 10"^ for 
the Hamihonian at {Knd,Ks) = (0.13781,0.13640). We 
describe the details of the DDHMC algorithm and the 
solver implementation used for n^d < 0.13770 in Ap- 
pendix 1^ 

As we reduce the up-down quark mass, we observe a 
tendency that the fluctuation of | IFir 1 1 during the molec- 
ular dynamics trajectory increases, which results in a 
higher replay rate due to the appearance of trajecto- 
ries with large AH. Since AH is controlled by the 
product of ^TiR and ||Fir||, a possible solution to sup- 
press the replay rate is to reduce ^tir. In this case, 
however, we find the acceptance becoming unnecessar- 
ily close to unity. Another solution would be to tame 
the fluctuation of ||Fir||, and we employ for this pur- 
pose the mass-Drcconditioner|13l [l4| to the IR part of the 
pseudofermion action. The quark mass in the precondi- 
tioner is controlled by an additional hopping parameter 
kJjjj = pKud, where p should be less than unity so that cal- 
culating with the preconditioner is less costly than with 
the original IR part. The IR force Fir is split into F{^ 
and Fir. The former is derived from the preconditioner 
and the latter from the preconditioned action. 

We employ the mass-preconditioned DDHMC 
(MPDDHMC) algorithm for the run at the lightest 
up-down quark mass of n^d — 0.13781. With our choice 
oi p — 0.9995 the relative magnitudes of the force terms 
become 
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According to this result we choose {Na,Ni,N2,N3) — 
(4, 4, 4, 6) for the associated step sizes. Here the choice 
of A''2 —4 does not follow the criterion i5rjp;||Fj'j^|| « 
(^tirIIFirII. This is because we take account of the fluc- 
tuations of ||Fir||. The replay trick is not implemented 
in the runs at n^d — 0.13781. For the step size for 
the strange quark in the UVPHMC algorithm we choose 



Sts = StIyi as we observe 1 1 Fg 
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The inversion of D during the molecular dynamics 
steps is also improved at K^d — 0.13781 in three ways, (i) 
We employ the chronological guess using the last 16 so- 
lutions to construct the initial solution vector oi D^^ on 
the full lattice [i^. In order to assure the reversibility we 
apply a stringent stopping condition |Dx — 6|/|6| < 10~^^ 
to the force calculation, (ii) The inversion algorithm is re- 
placed by a nested BiCGStab solver, which consists of an 
inner solver accelerated with single precision arithmetic 
and with an automatic tolerance control ranging from 
10"'^ to 10~^, and an outer solver with a stringent tol- 
erance of lO"^** operated with the double precision. The 
approximate solution obtained by the inner solver works 
as a preconditioner for the outer solver, (iii) We imple- 
ment the deflation technique to make the solver robust 
against possible small eigenvalues allowed in the Wilson- 
type quark action. Once the inner BiCGStab solver be- 
comes stagnant during the inversion of D, it is automati- 
cally replaced by the GCRO-DR (Generalized Conjugate 
Residual with implicit inner Orthogonalization and De- 
flated Restarting) algorithm |16|. In our experience the 
GCRO-DR algorithm is important for calculating D~^ 
but does not save the simulation time at n^d = 0.13781. 
More details of the MPDDHMC algorithm and the im- 
provements are given in Appendix |Bl 



D. Implementation on the PACS-CS computer 

All of the simulations reported in this article have 
been carried out on the PACS-CS parallel computer (lOJ. 



PACS-CS consists of 2560 nodes, each node equipped 
with a 2.8GIIz Intel Xeon single-core processor {i.e., 
5.6Gflops of peak speed) with 2 GBytes of main mem- 
ory. The nodes are arranged into a 16 x 16 x 10 array 
and connected by a 3-dimcnsional hypercrossbar network 
made of a dual Gigabit Ethernet in each direction. The 
network bandwidth is 750 MBytes/sec for each node. 

The programming language is mainly Fortran 90 with 
Intel Fortran compiler. To further enhance the perfor- 
mance we used Intel C-\ — h compiler for the single preci- 
sion hopping matrix multiplication routines which are the 
most time consuming parts. The Intel compiler enables 
us to use the Intel Streaming SIMD extensions 2 and 3 
intrinsics directly without writing assembler language. 

We employ a 256 node partition of PACS-CS to execute 
our 32'^ X 64 runs. The sustained performance including 
communication overhead with our DDHMC code turns 
out to be 18%. The computer time needed for one MD 
unit is listed in Table |T1 



E. Efficiency of DDHMC algorithms 

The efficiency of the DDHMC algorithm may be clari- 
fied in comparison with that of the HMC algorithm. For 
Nf — 2 QCD simulations with the Wilson-clover quark 
action, an empirical cost formula suggested for the HMC 
algorithm based on the CP-PACS and JLQCD TV/ = 2 
runs was as follows ISOll: 
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with C w 2.8. A strong quark mass dependence in the 
above formula l/{mT^/mp)^ ^ l/"^ud stems from three 
factors: (i) the number of iterations for the quark ma- 
trix inversion increases as the condition number which 
is proportional to 1/m^d, (ii)to keep the acceptance rate 
constant we should take St cx m^d for the step size in the 
molecular dynamics trajectories, and (iii) the autocorre- 
lation time of the HMC evolution was consistent with an 
l/wud dependence in the CP-PACS runsQ. 

To estimate the computational cost for Nf ==2 + 1 
QCD simulations with the HMC algorithm, we assume 
that the strange quark contribution is given by half of 
Eq. (jlip at rfiT^lnip = 0.67 which is a phenomenologically 
estimated ratio of the strange pseudoscalar meson "r7i^^^" 
and m^: 
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0.67. 



Since the strange quark is relatively heavy, its computa- 
tional cost occupies only a small fraction as the up-down 
quark mass decreases. In Fig. [1] we draw the cost formula 
for the Nf = 2-t-l case as a function of rriTr/rnp, where we 
take #conf=100, a=0.1 fm and L = 3 fm in Eq. (fTTj) as 
a representative case. We observe a steep increase of the 
computational cost below m-j; /nip ~ 0.5. At the physical 
point the cost expected from Eq. pTjl would be O(IOO) 
Tflops- years. 

Let us now see the situation with the DDHMC al- 
gorithm. The blue open symbol in Fig. [T] denotes the 
measured cost at (^udj^s) = (0.13770,0.13640), which 
is the lightest point implemented with the DDHMC al- 
gorithm. Here we assume that we need 100 MD time 
separation between independent configurations. We ob- 
serve a remarkable reduction in the cost by a factor 
20 — 30 in magnitude. The majority of this reduction 
arises from the multiple time scale integration scheme 



and the GCR solver accelerated by the SAP precondition- 
ing with the single-precision arithmetic. Roughly speak- 
ing, the improvement factor is 0(10) for the former and 
3 - 4 for the latter. The cost of the MPDDHMC al- 
gorithm at (Kud,Ks) = (0.13781,0.13640) is plotted by 
the blue closed symbol in Fig. [1] In this case, the re- 
duction is mainly owing to the multiple time scale inte- 
gration scheme armored with the mass-preconditioning 
and the chronological inverter for F^^ and Fir. As we 
already noted, the GCRO-DR solver does not accelerate 
the inversion albeit it renders the solver robust against 
the small eigenvalues of the Wilson-Dirac operator. 

Since we find in Table [l] that Tint [P] is roughly inde- 
pendent of the up-down quark mass employed in the 
DDHMC algorithm, the cost is expected to be propor- 
tional to I/ti^j. Assuming this quark mass dependence 
for the MPDDHMC algorithm, we find that simulations 
at the physical point is feasible, at least for L « 3 fm 
lattices, with 0(10) Tflops computers, which are already 
available at present. 



F. Autocorrelations and statistical error analysis 

The autocorrelation function T{t) of a time series of 
an observable O in the course of a numerical simulation 
is given by 



T{T) = {0{To)0{ro+T))-{0{ro))' 



(12) 



In Fig. [2] we show the plaquette history and the nor- 
malized autocorrelation function p(r) = r(r)/r(0) at 
i^ud = 0.13727 as an example. The integrated autocorre- 
lation time is estimated as rint[F] — 20.9(10.2) following 
the definition in Ref. [IM 
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where the summation window W is set to the first time 
lag r such that p{t) becomes consistent with zero within 
the error bar. In this case we find W = 119.5. The choice 
of W is not critical for estimate of Tint in spite of the 
long tail observed in Fig. [2l Extending the summation 
window, we find that Tint[F] saturates at Tint[f] « 25 
beyond W = 200, which is within the error bar of the 
original estimate. 

Our simulations at K^d = 0.13700 and 0.13727 are fast 
enough to be executed by a single long run of 2000 MD 
units. The simulations at K^d ^ 0.13754 become increas- 
ingly CPU time consuming so that we had to execute 
multiple runs in parallel . The data obtained from differ- 
ent runs are combined into a single extended series, for 
which we define the above autocorrelation function T(t) 
as if it were a single run. The results for Tint [P] are listed 
in Table HI Although we hardly observe any systematic 
quark mass dependence for the integrated autocorrela- 
tion time, the statistics may not be sufficiently large to 
derive a definite conclusion. 



In the physics analysis we estimate the statistical errors 
with the jackknife method in order to take account of the 
autocorrelation. For the simulations at Kud > 0.13754 
we apply the jackknife analysis after combining the dif- 
ferent runs into a single series. The bin size dependence 
of the statistical error is investigated for each physical 
observable. For a cross-check we also carry out the boot- 
strap error estimation with 1000 samples. In all cases we 
find the two estimates agree for the magnitude of errors 
within 10%. We follow the procedure given in Appendix 
B of Ref. i2] in estimating the errors for the chiral fit 
parameters. 



III. MEASUREMENTS OF HADRONIC 
OBSERVABLES 

A. Hadron masses, quark masses and decay 
constants 

We measure the meson and baryon correlators at the 
unitary points where the valence quark masses are equal 
to the sea quark masses. For the meson operators we 
employ 
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M/^(x) = 9/(x)rg,(x), 



(14) 



where / and g denote quark flavors and F are 16 Dirac 

matrices F = I, 75, 7^, 17^75 and i[7^,7^]/2 (^, j/ = 
1, 2, 3, 4). The octet baryon operators are given by 

Ofj'^ix) = e^''^iiq'}ix)fCj,q'^ix))qU^), (15) 

where a, b, c are color indices, C = 7472 is the charge 
conjugation matrix and a = 1, 2 labels the z-component 
of the spin 1/2. The S- and A-like octet baryons are 
distinguished by the flavor structures: 
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where Ol-'^l'* = O^^h „ (jgfh^ ^g ^^^^^ ^^le decuplet 
baryon operators for the four z-components of the spin 
3/2 as 
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^{{q^j{x))^CT+q]{xM,^{x)]/i, (19) 
Di\)^{x) = e'^'%{q'}{x)rCT,ql{x))qUx) 

-{{q'}{x)fCV^q]{x))ql,{x)]/i, (20) 

Dlf,^{x) = e-'\{q';{x)rCV^ql{x))ql^{x), (21) 

where T± — (7i=P72)/2, Fq — 73 and the flavor structures 
should be symmetrized. 



We calculate the meson and the baryon correlators 
with point and smeared sources and a local sink. For the 
smeared source we employ an exponential smearing func- 
tion ^(|x|) = Aqexp{-Bq\x\) {q = ud,s) with *(0) = 1 
for the ud and s quark propagators. The parameters Aq 
and Bq are adjusted from a couple of configurations after 
the beginning of the production run such that the pseu- 
doscalar meson effective masses reach a plateau as soon 
as possible. Their values are given in Table HIl The point 
and smeared sources allow the hadron propagators with 
nonzero spatial momentum, and we calculate them for 
p= (0, 0, 0), (vr/ie, 0, 0), (0, 7r/16, 0), (0, 0, 7r/16). 

In order to increase the statistics we calculate 
the hadron correlators with four source points at 
{xo,yo,zo,toMn,17,17,l), (1,1,1,9), (25,25,25,17), 
and (9,9,9,25) for Kud > 0.13754. They are aver- 
aged on each configuration before the jackknife analysis. 
This procedure reduces the statistical errors by typically 
20 — 40% for the vector meson and the baryon masses 
and less than 20% for the pseudoscalar meson masses 
compared to a single source point. For further enhance- 
ment of the signal we average zero momentum hadron 
propagators over possible spin states on each configura- 
tion: three polarization states for the vector meson and 
two (four) spin states for the octet (decuplet) baryons. 

We extract the meson and the baryon masses from 
the hadron propagators with the point sink and the 
smeared source, where all the valence quark propaga- 
tors in the mesons and the baryons have the smeared 
sources. Figures [SHG] show effective mass plots for the 
meson and the baryon propagators with the smeared 
source for Kud > 0.13754. We observe that the excited 
state contributions are effectively suppressed and good 
plateaus start at small values of t. 

The hadron masses are extracted by uncorrelated x^ 
fits to the propagators without taking account of corre- 
lations between different time slices, since we encounter 
instabilities for correlated fits using covariance matrix. 
We assume a single hyperbolic cosine function for the 
mesons and a single exponential form for the baryons. 
The lower end of the fit range tmin is determined by in- 
vestigating stability of the fitted mass. On the other 
hand, the choice of imax gives little influence on the fit 
results as far as the effective mass exhibits a plateau and 
the signal is not lost in the noise. We employ the same fit 
range [imin,imax] for the same particle type: [13,30] for 
pseudoscalar mesons, [10, 20] for vector mesons, [10, 20] 
for octet baryons and [8, 20] for decuplet baryons. These 
fit ranges are independent of the quark masses. Resulting 
hadron masses are summarized in Table IIIII 

Statistical errors are estimated with the jackknife pro- 
cedure. In Fig. [7] we show the bin size dependence of 
the error for mT^ and ?7i^^^ . We observe that the magni- 
tude of error reaches a plateau after 100—200 MD time 
depending on the quark mass. Since similar binsize de- 
pendences are found for other particle types, we em- 
ploy a binsize of 250 MD time for the jackknife analy- 
sis at 0.13770 > Kud > 0.13754. At our lightest point 



Kud = 0.13781 with the statistics of 990 MD units, we 
had to reduce the bin size to 110 MD units. 

We define the bare quark mass based on the axial vec- 
tor Ward-Takahashi identity (AWI) by the ratio of matrix 
elements of the pseudoscalar density P and the fourth 
component of the axial vector current A4 : 



■AWI 



.AWI 



{0\P\PS) 



(22) 



where |PS) denotes the pseudoscalar meson state at rest 
and / and g {f,g — ud, s) label the flavors of the va- 
lence quarks. We employ the nonperturbatively 0(a)- 
improved axial vector current A™^ = A4 + Cj^V4P 
with V4 the symmetric lattice derivative, and ca = 
—0.03876106 as determined in Ref. [3l|. In practice the 
AWI quark mass is determined by 
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(23) 



where to-ps, C^ and Cp are obtained by applying a si- 
multaneous x^ fit to 



(^rp(t)p^(o)) = 2C: 



sinh(-TOps(t-r/2)) 



exp(mpsr/2) 
with a smeared source and 

cosh(-TOPs(i-T/2)) 



(P(i)P"(0)> = 2C'p- 



exp(TOpsT/2) 



(24) 



(25) 



with a smeared source, where T denotes the tempo- 
ral extent of the lattice. We employ the flt range of 
[^minjimax] = [13,25] for the former and [13,30] for the 
latter at all the hopping parameters. The renormalized 
quark mass in the continuum MS scheme is defined as 
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The renormalization factors Za.p and the improvement 
coefficients Ba.p are perturbatively evaluated up to one- 
loop level 32, 33, 34] with the tadpole improvement. The 
VWI quark masses in the ma corrections are perturba- 
tively obtained from the AWI quark masses: 
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In Table IIVI we list the values of ' 



,MS 
'ud 



and 
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(28) 



renor- 



malized at the scale of 1/a, whose statistical errors are 
provided by the jackknife analysis with the bin size cho- 
sen as in the hadron mass measurements. 



The bare pseudoscalar meson decay constant on the 
lattice is defined by 



(o|Afp|ps) =f^rmps- 



(29) 



with |PS) the pseudoscalar meson state at rest consisting 
of / and g valence quarks. We evaluate fpg^° from the 
formula 



TOtt are consistent within the errors, we find a 1 — 2% 
deviation for nip and ttin. This is pictorially confirmed 
in Fig.[8]which shows the effective masses for the n meson 
and the nucleon. The pion effective masses are almost 
degenerate from t = 8 to 17, while a slight discrepancy 
is observed for the nucleon results. The p and nucleon 
masses may be suffering from finite size effects. 



xbarc 
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(30) 



IV. CHIRAL ANALYSIS ON PSEUDOSCALAR 

MESON MASSES AND DECAY CONSTANTS 



where we extract mpg, C\ , Cp and C^p from a simulta- 
neous fit of Eqs. dUl), dig) and 



(P(0P'(0))^2Ci. -'^-7^^V/27'^^ (31) 

exp(mpsT/2) 

with a local source. The fit ranges are [13, 25] , [13, 30] and 
[15,25], respectively, at all the hopping parameters. The 
bare decay constant /p"'' is renormalized perturbatively 
with 

/ps = u^Za ( 1 + bA * ^^^ ' J /^r, (32) 

where rni^^ is estimated by Eq. (p8)) . Table |TV] summa- 
rizes the results for /ps with the statistical errors evalu- 
ated by the jackknife analysis with the bin size chosen as 
in the hadron mass measurements. 



The analysis of chiral behavior of pseudoscalar meson 
masses and decay constants occupy an important place 
in lattice QCD. Theoretically the main points to exam- 
ine are the presence of chiral logarithms as predicted by 
ChPT and the convergence of the ChPT series itself. The 
viability of ChPT is relevant also for studies of finite-size 
effects. The low energy constants are important from 
phenomenological points of view. And finally, the chi- 
ral analysis is required to pin down the physical point in 
the parameter space of the simulations. We begin with 
a discussion of a subtle point in the chiral analysis when 
Wilson-clover quark action with implicit chiral symmetry 
breaking is employed. 



Chiral perturbation theory for O(a)-improved 
Wilson-type quark action 



B. Comparison with the previous 
CP-PACS/JLQCD results 

A comparison between the present PACS-CS results 
and those with the previous CP-PACS/JLQCD work[Q, 
0] obtained with the same gauge and quark actions is 
possible at (Kud,Ks) — (0.13700,0.13640), except that 
the lattice sizes are different: 32"^ x 64 for the former and 
20^ X 40 for the latter. In Table |V] we list the PACS-CS 
and the CP-PACS/JLQCD results for the hadron masses 
at (Kud,Ks) = (0.13700,0.13640). While the results for 



Our simulations are carried out with a non- 
perturbatively 0(a)-improved Wilson quark action. At 
present we correct 0{ma) terms in the AWI quark masses 
and decay constants by one-loop perturbation theory. 
These corrections are expected to be very small in mag- 
nitude, and hence leading scaling violations in meson 
masses and decay constants from our simulations can be 
taken as 0{a^). In this case, the NLO formula of Wilson 
chiral perturbation theory for the SU(3) fiavor case[3q], 
which incorporates the leading contributions of the im- 
plicit chiral symmetry breaking effects of the Wilson- type 
quarks, are given by 
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(33) 
(34) 
(35) 
(36) 



where the quark masses are defined by the axial-vector 
Ward-Takahashi identities: rriud = '™ud^' s-^d "^s — 
mf^^. ^4,5,6,8 are the low energy constants and /ups 
is the chiral logarithm defined by 



where 
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Indeed re-expansion of the terms in the curly brackets 
of (l33ll to ((36|l gives rise only to terms of form 0{m,q ■ 



a) and 0(mglnTOg 



a ), which are NNLO in the order 



counting of WChPT analysis and hence can be ignored. 
Thus, up to NLO, WChPT formula are equivalent to the 
continuum form. Note that the expressions in terms of 
the VWI quark masses take different forms and cannot 
be reduced to those of the continuum ChPT. Hereafter 
we concentrate on the continuum ChPT. 



with /i the renormalization scale. 

The two additional parameters H" and H' are asso- 
ciated with the 0{a?) contributions distinguishing the 
Wilson ChPT from that in the continuum. Since these 
parameters are independent of the quark masses, their 
contributions can be absorbed into Bq and /o by the fol- 
lowing redefinitions: 
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B. SU(3) chiral perturbation theory 



The SU(3) ChPT formula in the continuum up to 
NLO [36] is given by 
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r 



There are six unknown low energy constants 
Bo, /o, ^4,5,6,8 in the expressions above. £4,5,6,8 
are scale-dependent so as to cancel that of the chiral 
logarithm given by (p7|) . We can determine these pa- 
rameters by applying a simultaneous fit to m'^/(2mud)i 
mj^/irriud + Ws), U and fx- 

In order to provide an overview of our data we plot 
in Fig. [9] a comparison of the PACS-CS (red symbols) 
and the CP-PACS/JLQCD results (black symbols) for 
ml/m^J^^ and /k/U as a function of m^wi. The 
two data sets show a smooth connection at K^d = 
0.13700 (TOu7^ = 0.028). More important is the fact 
that an almost linear quark mass dependence of the 
CP-PACS/JLQCD results in heavier quark mass region 
changes into a convex behavior, both for m^/m^^ and 



Ik/ It: 



,AWI 



is diminished in the PACS-CS results. 



This is a characteristic feature expected from the ChPT 
prediction in the small quark mass region due to the chi- 
ral logarithm. This curvature drives up the ratio JkI J-k 
toward the experimental value as the physical point is 
approached. 

Having confirmed signals for the presence of the chiral 
logarithm, we apply the SU(3) ChPT formulae (|43l) - (lie)) 
to our results. We choose the four simulation points 
at Kud > 0.13754. In Fig. [5] these four points lie to 
the left and around the turning point of the curvature. 
They also correspond to the region where the p meson 
mass satisfies the condition m,p > 2171-^, and hence lie 
to the left of the threshold singularity in the complex 
energy plane for the p meson. The heaviest pion mass 



at (Kud,Ks) = (0.13754,0.13640) is about 410 MeV with 
the use of a^^ = 2.176(31) GeV determined below. The 
measured bare AWI quark masses, but corrected for the 
0{ma) corrections at one-loop perturbation theory, are 
used for mud and ms in Eqs. (|43)l — (|46)l . 

We present the fit results for the low energy con- 
stants in Table IVII The results are quoted both with- 
out (w/o FSE) and with (w/ FSE) finite-size corrections 



in the ChPT formulae (see Sec lIVPp . We also list the 
phenomenological estimates with the experimental in- 
puts [33, [3a|, and the results obtained by recent 2-1-1 
fiavor lattice QCD calculations [39|, |40|. The renormal- 
ization scale is set to be 770 MeV. The MILC results for 
the low energy constants quoted at the scale of m^ are 
converted according to Ref. |36j 



J 






1 



L4(/i) = L4(to„) — -r^rr — ^ In 

{2U-U)(p) = {2Ls - Lr,){m^) + 



2567r2 

3 

2567r2 

(2Le - L4)(mr,) - 



ln(4 
m^ 



1 



9/ 2567r2 
4\ 1 



3/ 2567r2 



(47) 
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with /i the renormalization scale. For L^ and L^ govern- 
ing the behavior of /^r and fx , we find that all the results 
are compatible. On the other hand, some discrepancies 
are observed for the results of 2Lq — L4 and 2Ls — L5 



contained in the ChPT formulae for m'i and m^ 



K- 

For later convenience we convert the SU(3) low energy 
constants -Bo, /o, ^4,5,6,8 to the SU(2) low energy con- 
stants B, /, ^3^4 defined by 
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with 777^ = iriudB. The NLO relations are given bv|36 
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and li (i = 3, 4) are defined at the renormalization scale 
fi^m^ ^ 139.6 MeV jHI: 



k = :d^[h+^^^] (61) 
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In Table rVTll we summarize the results for the SU(2) low 
energy constants obtained by the conversion from the 
SU(3) low energy constants. The vacuum condensations 
are defined by 

{uu)a = {uu)\,n^^=rn,=Q = --f^Bo, (64) 



'rririd—O.ms—m. 



p„..o.i = --/^B. (65) 



These quantities are perturbatively renormalized at the 
scale of 2 GeV. 

In Fig. [TO] we compare our results for 13^4 with those ob- 
tained by other groups whose numerical values are listed 
in Table IVIIII Black symbols denote the phenomeno- 
logical estimates, blue symbols represent the results ob- 
tained by the SU(2) ChPT fit on 2 flavor dynamical con- 
figurations and red closed (open) symbols are for those 
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obtained by the SU(3) (SU(2)) ChPT fit on 2+1 flavor 
dynamical configurations. For l^ all the results reside 
between 3.0 and 3.5, except for the MILC result which 
is sizably smaller and marginally consistent with others 
within a large error. On the other hand, we find a good 
consistency among the results for U. 

We have found that the SU(3) ChPT fit gives reason- 
able values for the low energy constants. However, we are 
concerned with a rather large value of x^/dof=4.2(2.9) 
(see Table IVI[) . Figures [TT] and [T^ show how well the 
data for rn^/rriud, 2m^/(mud + "t-s)j /tt and Jk are de- 
scribed by the SU(3) ChPT up to NLO. The filled and 
open circles are our data, and the fit results are plot- 
ted by blue triangles. We note in passing that , for the 
Wilson-clover quark action, mf^^ varies at 0{a^) as niud 
varies even if Kg is held fixed. Thus we are not able to 
draw a line with a fixed value for m^^^. The blue star 
symbols represent the extrapolated values at the physi- 
cal point whose determination will be explained below in 
Sec. El 

The points around in^"^^ ~ 0.01 corresponds to 
{Kud,Hs) = (0.13754,0.13640) and (0.13754,0.13660). 
Marked deviations between circles and triangles show 
that the SU(3) ChPT poorly accounts for the strange 
quark mass dependence of f.^ and /k ■ This flaw is mainly 
responsible for the large value of x^/dof. 

In order to investigate the origin of discrepancy be- 
tween the data and the fit more closely, we draw the 
relative magnitude of the NLO contribution to the LO 
one for rn^/rriud, 2?Ti|./(77iud + "T^s), /tt and /k as a func- 
tion of rn^^^ in Figs. [T3] and [TH The strange quark 
mass is fixed at the physical value, and the contribu- 
tions from TT, K and rj loops are separately drawn. The 
relative magnitudes are at most 10% for mj/mud and 
27TT,x/("^ud + JTis)- We find, however, significant NLO 
contributions for the decay constants. For f^ the rela- 
tive magnitude rapidly increases from 10% at rriud = 
to 40% at around ?Tiud = 0.01. The situation is worse for 
/k for which the NLO contribution is about 40% of the 
LO one even at ?nud = 0, most of which arises from the 
K loop. 



C. SU(2) chiral perturbation theory 

The bad convergences of the chiral expansions for /^ 
and /k tell us that the strange quark mass is not light 
enough to be appropriately treated by the NLO SU(3) 
ChPT. There are two alternative choices for further chiral 
analysis. One is to extend SU(3) ChPT to NNLO, and 
the other is to use SU(2) ChPT with the aid of an analytic 
expansion for the strange quark contribution around the 
physical strange quark mass. 

The former method, which has been employed by the 
MILC collaboration in an incomplete fashion 40], is very 
demanding: we cannot determine the additional low en- 
ergy constants at NNLO without significantly increas- 
ing the data points. There is in addition no guar- 



antee that the expansion is controlled at NNLO. We 
therefore consider that the latter route is more natu- 
ral. This alternative was employed by the RBC/UKQCD 
collaborationist- Since they had data only at a single 
strange quark mass, they could not study the strange 
quark mass dependence. This we shall do with our data 
thanks to the second choice of the strange quark mass at 
Kud = 0.13754. 

For m^ and /^ the SU(2) ChPT formulae of ^ and 
([5^ are employed. The low energy constants B and / are 
functions of the strange quark mass. Assuming that we 
run simulations close enough around the physical point 
for the strange quark mass so that a linear expansion in 



771s is sufficient, we write B 

p(0) 



B 



(0) 



fUs Bs and / 



/s"^ + TTisfs , where it should be noted that Bg ^ Bq 
and /i*'^ ^ /o. 

For the kaon sector we treat the K mesons as matter 
fields in the isospin 1/2 linear representation, and couple 
pions in SU(2) invariant ways (see, e.g., Ref. [43|). For 
rriK and /k this leads to the following fit formulae: 
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with f — fs + iTT-s fs . In these formulae, the linear 
expansion in 777s should be regarded as that around the 
physical strange quark mass. 

We apply a simultaneous fit to 7777r, fn and fx employ- 
ing the formulae of Eqs. (HI]), ^^ and ^T^j. The kaon 



mass mj^ is independently fitted according to Eq. ((5^ . 
Calling the four data points corresponding_ to Kud > 
0.13754 as Range I, the fit results for B,f, 1^,14 at the 
physical strange quark mass are presented in Table IIXI 
and Fig.[Tn]both without and with finite-size corrections. 
We find that they are consistent with those obtained by 
the NLO conversion from the SU(3) low energy constants 
given in Table IVlIl Although our result for {uu) is about 
50% smaller than that of RBC/UKQCD, the difference 
comes from estimates of the renormalization factor: we 
use one-loop perturbation while they employ the non- 
perturbative RI-MOM scheme. This is verified by the 
observation that the value of / and the renormalization- 
free quantities m^dB and rngB show consistency between 
our results and those of RBC/UKQCD. 

Figures [T5] and [12] show that the quark mass depen- 
dences of 777^/777^^^^ /^ and Jk arc reasonably described 
by the SU(2) ChPT formulae of dUD , (USD and dS?]) . The 
resulting x^/dof is 0.33(72), which is an order of mag- 
nitude smaller than in the SU(3) case. In Fig. [T7] we il- 
lustrate the relative magnitude of the NLO contribution 
against the LO value for m^/m-ad, fw and Jk as a func- 
tion of 777^^^ fixing the strange quark mass at the phys- 
ical value. The convergences for /^ and Jk are clearly 
better than the SU(3) case. 

In order to investigate the stability of the fit, we try two 
additional choices of the data sets for the SU(2) ChPT fit: 
Range II (Kud,Ks)==(0. 13781,0. 13640), (0.13770,0.13640), 
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(0.13754,0. 13640), (0.13754,0. 13660), (0.13727,0.13640) 
includes one more data at a heavier pion mass added 
to Range I, and Range III (Kud,Ks)=(0. 13770,0. 13640), 
(0.13754,0. 13640), (0.13754,0. 13660), (0.13727,0.13640) 
removes the point with the hghtest pion mass from 
Range II. The results for B,f,l3,U and corresponding 
X^/dof are given in Table IIXI While inclusion of the 
data at Kud = 0.13727 increases the value of x^/dof, 
the results for B, f, 1^,14 are consistent among the three 
cases within the error bars. 



D. Finite size effects based on chiral perturbation 
theory 

We evaluate finite-size effects based on the NLO for- 
mulae of ChPT. In the case of SU(3) ChPT the finite 
size effects defined by Rx — iX(L) — X((X)))/X(oo) for 
X = m^,mK, fn, Ik are given by [sj: 
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where Ki is the Bessel function of the second kind 
and m[n) denotes the multiplicity of the partition n — 
TV^ + riy + n1. The authors in Ref . [33 expect that the 
above formulea are valid for TTi^ri > 2, in which our sim- 
ulation points reside. In Figs. [TT] and [T^ we also plotted 
the ChPT fit results including finite size effects. The 
results are almost degenerate with the fit results with- 
out finite size effects except at the lightest simulation 
point at Kud — 0.13781 and the extrapolated values at 
the physical point. This feature is understood by look- 
ing at Fig. [TH] where we plot the magnitude of Rx for 
X = mT^jTriK, /tt, fx with L = 2.9 fm as a function of 
m^ keeping the strange quark mass fixed at the physical 
value. The expected finite size effects are less than 2% 
for Tops and /ps at our simulation points. For mps this 
is true even at the physical point, while the value of /^ 
is decreased by 4% due to the finite size effects. 

We can repeat the above study for the SU(2) case. The 



NLO formulae for tUt^ and /tt are given bvJ37 
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In Figs . [T51 and \W\ we hardly detect finite size effects for 
TO^^i > 0.001. Figure [m shows R'^ for X = to^, /^ with 
L = 2.9 fm as a function of TOtt. The situation is similar 
to the SU(3) case: although finite size effects increase as 
TO,^ decreases, their magnitudes arc at most 2% for to,^ 
and 4% for /^ even at the physical point, which is easily 
expected by comparing the expressions of R and R' . 

Let us add a cautionary note that the finite-size for- 
mulae analyzed here lose viability when ttIt^L becomes 
too small. Precisely at what values of rriT^L this takes 
place is not well controlled theoretically, however. Direct 
simulations on a larger lattice is required to pin down 
the actual magnitude of finite-size effects at the physical 
point. The need for such calculations are even more for 
baryons whose sizes are larger than mesons. 



V. RESULTS AT THE PHYSICAL POINT 

We need three physical inputs to determine the up- 
down and the strange quark masses and the lattice cut- 
off. We choose TO,r, 'tuk and toq. The choice of toji has 
both theoretical and practical advantages: the f2 baryon 
is stable in the strong interactions and its mass, being 
composed of three strange quarks, is determined with 
good precision with small finite size effects. 

For the pseudoscalar meson sector, we employ SU(2) 
chiral expansion as explained in the previous Section. For 
the vector mesons and the baryons we use a simple lin- 
ear formula mhad = ah + /3/iTO„^^ -t- -fhmf^^, employing 
the data set in the same range n^d > 0.13754 as for the 
pseudoscalar meson sector. We do not rely on heavy me- 
son effective theory (HMET) ^ or heavy baryon ChPT 
(HBChPT)[4J| since they show very poor convergences 
even at the physical point [Hi. In Figs. [201 [Hj and [H 
we show linear chiral extrapolations of the vector meson, 
the octet and the decuplet baryon masses, respectively. 
Blue symbols represent the fit results at the measured 
values of to,^^^. The extrapolated values at the phys- 
ical point are also denoted by blue star symbols, which 
should be compared with the experimental values plotted 
at m^wi = 0. 

Since the linear fit is applied to the data set at K^d > 
0.13754, blue symbols at Kud < 0.13754 express the pre- 
dictions from the fit results. We observe that the quark 
mass dependence of toq is remarkably well described by 
the linear function, which assures that toq is a good 
quantity for the physical input in the sense that its chiral 
behavior is easily controlled. 

The results for the physical quark masses and the lat- 
tice cutoff are listed in Table 1x1 where the errors are sta- 
tistical. They are provided with and without the finite 
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size corrections based on the NLO SU(2) ChPT analy- 
ses. Both results are almost degenerate. We find that our 
quark masses are smaller than the estimates in the recent 
2+1 flavor lattice QCD calculations [33, 140] ■ We note, 
however, that we employ the perturbative renormaliza- 
tion factors at one-loop level which should contain an 
uncertainty. A nonperturbative calculation of the renor- 
malization factor is in progress using the Schrodinger 
functional scheme. 

In Table |X] we also present the results for the pseu- 
doscalar meson decay constants at the physical point 
using the physical quark masses and the cutoff deter- 
mined above, which should be compared with the ex- 
perimental values/^ = 130.7 MeV, Jk = 159.8 MeV, 
fK/f-K — 1.223[4y|. We observe a good consistency 
within the error of 2-3%. The ratio is 3% smaller than 
the experimental value in the case of the SU(2) ChPT 
fit with the finite size corrections. A nonperturbative 
calculation of Za is also in progress. 

In Fig. [53] the light hadron spectrum extrapolated to 
the physical point using SU(2) ChPT with the finite size 
corrections are compared with the experimental values. 
Numerical values with and without the finite size cor- 
rections are listed in Table IXII The largest discrepancy 
between our results and the experimental values is at 
most 3%, albeit errors are still not small for the p meson, 
the nucleon and the A baryon. The results are clearly 
encouraging, but further work is needed to remove the 
cutoff errors of ©((aAqco)^)- 



VI. STATIC QUARK POTENTIAL 

In addition to the hadronic observables presented so 
far, we also calculate the Sommer scale which is a pop- 
ular gluonic observable. In order to calculate the static 
quark potential we measure the temporal and the spatial 
Wilson loops with the use of the smearing procedure of 
Ref. [43| • The number of smearing steps is determined to 
be 20 after examining the sufficient overlap of the Wil- 
son loops onto the ground state. The potential V{r) is 
extracted from the Wilson loops applying a correlated fit 
of the form 



W{r,t) = C(r)exp(-F(r)t), 



(77) 



where the same fitting range [tmjn , imax] — [5,8] is cho- 
sen for all the simulations after investigating the effective 
potential 



Fcff(r,t)=ln 



W{r,t) 



W{r,t + l) 



(78) 



FigurclMlshows a typical case of Vcs{r, t) with r = 4,8, 12 
at Kud = 0.13770. We find that plateau starts a,t t — 4 
and signals are lost beyond t = 7. A result of V{r) at 
Kud = 0.13770 is plotted in Fig. [23 as a representative 
case. Since good rotational symmetry and no sign of the 



string breaking are observed, we employ the following 
fitting form for the potential: 



V{r) = Vo h err. 



(79) 



where Vq, a, a are unknown parameters. The fitting 
range is [rniin,ri„ax] = [3, 16]. 

The Sommer scale rg is a phenomenological quantity 
defined by 



dVir) 



1.65. 



(80) 



r=ro 



1.65- 



dr 



Given Eq. ([79|l we obtain 



To 



In Table IXIII wc list the results for tq including the sys- 
tematic errors due to the choices of imin a-nd r^^i^- 

At (Kud,Ks) = (0.13700,0.13640), our result is com- 
pared with those of CP-PACS/ JLQCD Q in Table[XllT] 
The two results are in reasonable agreement given the siz- 
able magnitude of systematic errors caused by the short- 
ness of plateau of effective masses for potentials. 

In order to extrapolate tq to the physical point we em- 
ploy a linear form 1/ro — ar + Pr ■ "^udT^ + Ir ■ mf^^ 
for the data set at Kud > 0.13754. We illustrate the 
chiral extrapolation in Fig. [551 where the fit results are 
plotted by red triangles at the measured values of rn^^^. 
The extrapolated result of tq at the physical point is 
5.427(51)(+81)(-2), which is 0.4921(64)(-f74)(-2) fm 
in physical units with the aid of a~^ = 2.176(31) GeV. 
The first error is statistical and the second and the third 
ones are the systematic uncertainties originating from the 
choice of imin and r^^ , respectively. 



VII. CONCLUSION 

We have presented the first results of the PACS-CS 
project which aims at a 2-1-1 fiavor lattice QCD sim- 
ulation at the physical point using the 0(a)-improved 
Wilson quark action. The DDHMC algorithm, cou- 
pled with several algorithmic improvements, have en- 
abled us to reach m^ — 156 MeV, which corresponds 
to ni^in = 2 GeV) =3.6 MeV. We are almost on the 
physical point, except that the strange quark mass is 
about 20% larger than the physical value. 

We clearly observe the characteristic features of the 
chiral logarithm in the ratios m'^/m^^^ and Jk/ f-n- We 
find that our data are not well described by the NLO 
SU(3) ChPT, due to bad convergence of the strange 
quark contributions. We instead employ the NLO SU(2) 
ChPT for 771^ and /.„■, and an analytic expansion around 
the physical strange quark mass for mx and fx in order 
to estimate the physical point. The low energy constants 
obtained in this way are compatible with phenomenolog- 
ical estimates and other recent lattice calculations. 
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Thanks to the enlarged physical volume compared to 
the previous CP-PACS/JLQCD work, we obtain good 
signals not only for the meson masses but also for the 
baryon masses. After linear chiral extrapolations of the 
vector and baryon masses the hadron spectrum at the 
physical point shows a good agreement with the exper- 
imental values, albeit some of the hadrons have rather 
large errors and scaling violations remain to be examined. 
We find smaller values for the physical quark masses com- 
pared to the recent estimates in the literature. This may 
be due to the one-loop estimate of the renormalization 
factor. 

At present the simulation at the physical point is un- 
der way, and the statistics of the run at n^d — 0.13781 is 
being accumulated. We are evaluating the nonperturba- 
tive renormalization factors for the quark masses and the 
pseudoscalar meson decay constants in order to remove 
perturbative uncertainties. 

Once these calculations are accomplished, the next 
step is to investigate the finite size effects at the physi- 
cal point, and then to reduce the discretization errors by 
repeating the calculations at finer lattice spacings. 



Kud < 0.13770 runs. 



1. Domain decomposed HMC effective action 

In this work we employ the 0(a)-improved Wilson 
fermions. Before applying the domain-decomposition 
preconditioning for the quark determinant, we first apply 
Jacobi preconditioning to split the local clover term. The 
0(a)-improved Wilson-Dirac operator D is expressed as 



D^l + T + M, 



(Al) 



where T is the local clover term, M is the hopping term. 
Jacobi preconditioning transforms the up-down quark de- 
terminant I det [D] I ^ to 



I det[D]\^ ^ I det[l + T]\^\ det[i)]p 



(A2) 



where D = 1 + {1 + T)-^M = 1 + M. By splitting lattice 
sites into even and odd domains, D has the following 2x2 
blocked matrix form, 
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APPENDIX A: DDHMC ALGORITHM 

In Appendix|Xl we describe our implementation details 
of the Liischer's DDHMC algorithm[ll] employed for our 



D 



Dee Deo 
Doe Doo 



(A3) 



where the suffix E (O) means the even (odd) domain. 
Applying the domain decomposition preconditioning for 
this form, we obtain 

\det[D]\'' = \det[l + T]\^\det[DEE]\^\det[Doo]\^ 

x\det[DEE]\^ (A4) 

where Dee is the Schur complement of D and expressed 
as 



Dee = 1 - {DEEr'DEoiDoor^DoE- 



(A5) 



Our domain decomposition is based on the four dimen- 
sional checkerboard coloring. 

The operator Dee can be further preconditioned by 
the spin and hopping structure because Deo (Deo) only 
connects domain surface sites. Let p^^™ (P^^™) be the 
spin and site projection operator to the even (odd) do- 
main sites, 



p^pm,/, _ 






if 71 is located on the bulk site of the even domain, 

^(1 + 7/^)V'n if f^ is located in the even domain and n + fi is in the odd domain with one value of /i only, 

^(1 — j^Jipn if n is located in the even domain and n — /i is in the odd domain with one value of /i only, 

ipn otherwise. 

, ^6) 

I 



This projection operator satisfies the following relations. and the same relation holds for the odd domain case. 



spin 



Doe = DoePT", 



(A7) 
(A8) 



With these properties, Dee satisfies 



^EE — ^ ^ ^E + ^EE^E 



(A9) 



This means that Dee is a triangular matrix in view of 
the projection space. Thus we have 



det[DEE] = det[Pf "i^BfiPf ™], 



(AlO) 



where the matrix dimension of the operator 
pspinj~^^^pg)m jg effectively reduced. We define 



^EE = ^E ^EE^E ' 



(All) 



for the reduced operator. 

Since the domain block lattice extent we use is 8 and 
is an even number, the domain restricted operator Dee 



J 
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(Doo) can be further preconditioned by the even-odd 
site preconditioning which is widely used for full lattice 
case in the literature. 

det[DE£;] = det[{DEE)ee], (A12a) 

{DEE)ee = I - {MEE)eo{MEE)oe, (A12b) 

where the suffices eo and oe mean hopping from an odd- 
site to an even-site and vice versa. For the odd domain 
operator Dqo the same relation exists. Our even-odd site 
preconditioning is based on the four dimensional checker- 
board coloring again. 

After applying all these preconditioning we obtain the 
following lattice QCD partition function for degenerate 
up-down quarks. 



Z = 



H[P,U,(bEe,cboe,XE] = ^Tt[P^] + Sg[U] + Scw[U] + ^ S^vv ,x[U, ^Xe] + S^m[U, xe], (A13b) 



X = E,0 



r 



where P is the canonical momenta for U, Sg[U] the gauge 
action, and 

S,w[U] = -21og[det[(l-fT)]], (Al4a) 

' , (A14b) 



SqUV,X [U, 4>Xe 
SqlK \U, XE 



{{Dxx)ee)'^C^Xc 
2 



/jjspm-^ 



'EE 



XE 



(A14c) 



where xe is projected so as to satisfy P^™xe = Xe- Our 
DDHMC algorithm is based on this partition function 
and the UVPHMC algorithm for strange quark is simply 
added to this form. 



2. Multi time scale molecular dynamics integrator 

We employ the Sexton- Weingarten multiple time scale 
molecular dynamics (MD) integrator [23|. The ordering 
to evolve link variables and momenta is arbitrary in the 
simple leap-frog integrator, and it is known that the so- 
called QPQ-orderin g ha s better performance than that 
of the PQP-ordering[l^|4^, [50]. While an actual perfor- 
mance comparison is not made systematically, this leads 
us to implement the QPQ-ordered multi time step inte- 
grator expecting better performance. 

Suppose that there is a Hamiltonian H expressed as a 
sum of N potentials: 



N-l 



H = T{p) + j2yM, 



(A15) 



where q represents dynamical variables, T(p) the kinetic 
term p^/2, and p is the canonical momenta. This leads 
to the following equation of motion: 

q = p^-{H,q}p, (A16a) 

Af-1 

p ^ F^ ~{H,p}p - Y. ^*' (A16b) 



j=0 



{X,Y}p 



-^ = ~-{V.,p}p, 

dX BY dX dV 
dq dp dp dq ' 



(Al6c) 
(A16d) 



where {X, F}p is Poisson bracket, and the dot ' is the 
abbreviation for the time derivative d/dr. The formal 
solution is written as 



P 



(r) = cxpItLh} 



(0), 



(A17) 



where exp{r_L/f} is the exponentiation of the Liou- 
villean LhX = —{H, X}p. In our case LhX — 
LtX + J2f=o^ Lv^X, where LtX = -{T,X}p and 
Ly.AT — ~{Vi,X}p. We assume that the numbering 
of the potential Vi is ordered so as to satisfy \Fi\ < 
\Fi-i\. Any molecular dynamics integrator is an ap- 
proximation/decomposition of the operator exponential 
exp{rL/f} using expjrLy} and exp{TLvi}- 

To explain symplectic molecular dynamics integrators 



we introduce the following mapping: 

Q{6t) = cxpItLt} : {q,p) -^{q + STp,p), (Al8a) 
P,{6t) = expjrLvJ : {q,p) ^ {q,p + STF,){AlSh) 
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Using these operators we can derive the following multi 
time scale integrators. 



J 



a. PQP-ordering The PQP-ordered multi time step integration operator S ^ (t) is defined as 

5PQP(r, (iVo,^i, • • ■,Nn-i)) = Sl'f.iT, {No,N,, . . . ,iV^_i)), 
where Sj^_^ is recursively defined as 

-\Ni 



(A19) 



jPQP 



S^'^^iT,iNo,N,,...,N,)) = 



-PQP 



r^V^PQPf^ 



^^l2]V-j^- U'(^"'^---'^-)j^H2iV- 



S^^^{r,No) ^ 



^° ' 2^) "^ (i) "^^ (^ 



No 



(A20) 



where Ni is the step number for each time scale. The momentum is updated by SriFi with Sri — t/(Y[j=o i Nj) at 
depth i. 

b. QPQ-ordering The QPQ-ordered multi time step integrator is used for our productive runs. The QPQ-ordered 
integrator 5"^^^ is defined as 



5QPQ(r, (A^o, m,..., Nn-i)) - SfJ^iT, {No, A^i, . . . , A^at-i)), 
where S^_{ is recursively defined as 



(A21) 



^-QPQ 



^r''(r,(A^o,...,Af.)) 



A,, 



SZ"" ( ^, (^0, . . . , N.^,) ]p4^] sZ"" ( ^, (No, . . . , A^,:_i) 



;:QPQ 



S^^^ir,No) ^ 



n2kr° i <4 



N,, 

«0 



A^,; 



Kl+<5.,„_i) 



(A22) 



In this case the division numbers, Ni, should be chosen from even numbers except for the outermost division number 

A^jv-i. , 

I 



The integrator described above is based on the nesting 
of the simple leap-frog integrator. We also note that we 
have not yet tried the so-called Omelyan integrator |50l 
ISlj for the recurrence kernel, albeit it is generally known 
to be a better scheme and may be used for our case. The 
multi time step integrator with the Omelyan kernel has 
been used in Refs. [33, [521 • 



3. UV part solver 

The UV part of the HMC algorithm is governed by 
the action Eq. (|A14b[) . This contains the inversion of 

{DEE)ee and {Doo)ee- In our parallel implementation of 
the algorithm, each block lattice is completely contained 
in a single node. This means that there is no ghost site 
exchange for multiplying Dee- In this case SSOR pre- 
conditioning with natural site orderin g is more efficient 
than the even-odd site preconditioning [53j. 



We solve the linear equation 

{DEE)eeXe = ^e 



(A23) 



using SSOR preconditioned GCR solver where Xe and b^ 
carry the even-site data in the even-domain. We imple- 
mented the SSOR prcconditioncr with single precision 
arithmetic. 

To solve Eq. (|A23p with an SSOR preconditioner, we 
transform Eq. (jA23[) back to the unpreconditioncd form. 



DeeV 



Ve 



(A24a) 
(A24b) 

(A24c) 



where j/e is the even-site components of the full even- 
domain vector y. The right hand vector c has zero for 
the odd-site components and has b^ for the even-site com- 
ponents. 
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We make use of the block/domain independence among 
the computational nodes and matrix structure of the do- 
main operator Dee to solve Eq. (jA24a|. With the natu- 
ral site-ordering in each block, Dee can be decomposed 



Dee = l-L-U, 



(A25) 



where L is the forward hopping term and U is the 
backward hopping term. The L and U are strictly 
triangular for the natural site ordering because of the 
Dirichlet boundary condition for each block in a domain. 
Eq. (|A24a|) is solved by 

d^{l-ujLy^c, (A26a) 

{Dee)ssokMssokz = d, (A26b) 

y = (1 - ujU)-^Mssorz, (A26c) 

where u is an over-relaxation parameter to be tuned, 
(£'_ee)ssor and Mssor are defined by 



(Dee) 



1 



SSOR 



[{I - ujL)-^ + {1 - LuU)-^ 

(lu - 2)(1 - LoL)-\l - ujU)''^] , 

(A27) 



JVssOR r 

-^^ssoR =2^ ( 1 - {Dee)ssor] 



(A28) 



32bit 



The preconditioner A/ssOR is computed in single preci- 
sion, and Eq. (|A26b[) is solved using GCR solver to dou- 
ble precision. The inverses (1 — ujL)~^ and (1 — ojU)"^ 
are easily solved by forward and backward substitutions, 
and Eq. (|A27|) is computed using the Eisenstat trick. The 
parameter uj is tuned to '^ 1.2 and A^'ssoR to 5 ~ 10 to 
achieve optimal performance. The maximal Krylov sub- 
space dimension A^kv for GCR solver is chosen to avoid 
frequent restarting and residual stagnation, and our ex- 
perience tells that A'kv ^ 0(10) is sufficient. 



as 



XE 


= PT'^ibE-VE), 


(A30a) 


Dy 


= w, 


(A30b) 


w 


/ \ 
V DoEbE J ' 


(A30c) 



where i/e is the even domain component of y, and w is the 
full lattice vector for which the even domain components 
are set to zero. 

Eq. (lASObp is efficiently solved with the GCR-SAP 
solver 291 via 



DMsApz 

y 



w, 
MsApz. 



(A31a) 
(A31b) 



The SAP preconditioner Msap is computed in single pre- 
cision as 



AfsAP 



NsAP 

X ^ (1 - DKy 

3=0 



(A32) 



32bit 



where 



K = 



(A33a) 



Aee 

-AqoDoeAee Aqo 
Aee = {1-ujU)-^Mssop{1-ujL)-\ (A33b) 



and Aqo similar to Aee- The operator Aee {Aqo) is 
the approximation for {Dee)~^ {{Doo)~^) via the SSOR 
fixed iteration AfssoR defined in Eq. (jA28|) . 

Thus the solver for Eq. (|A29P contains several tun- 
able parameters; uj, NssoR, A"sap, and A'kv the maxi- 
mal Krylov subspace dimension for GCR. We observed 
that UJ ^ 1.2, AssoR = 1, A^sap = 10 ~ 20 and 
Akv = 40 ~ 100 show satisfactory performance. 



5. Dead/alive link method 



4. IR part solver 

The IR part of the DDHMC algorithm contains the 
linear equation as 



D£e ^e = bi 



(A29) 

Eq. (|J29l) is 
P^^^)xe — and 



where Dee i^ defined by Eq. (JA11|1 
solved with the restrictions (1 
(1 - Pf '")6b = 0. 

As described in Ref. [ll|, directly solving Eq. (|A29|) 
is rather slow because the operator D'^ee^ contains the 
domain inversions {Dee)~^ and (Doo)^^ with double 
precision. Instead of solving Eq. (jA29p . the solution xe 
can be expressed using the unpreconditioned operator D 



Liischer's DDHMC algorithm was originally proposed 
for the plaquette Wilson gauge action and the unim- 
proved Wilson fermion 11]. He restricted the link vari- 
ables evolved by the MD integrator to a subset. The 
link variables which connect the domain interfaces and 
are located parallel to the domain surfaces are kept fixed 
during the MD evolution (dead links), and only the re- 
maining bulk links are evolved (alive links). The choice 
of the set of dead links are dictated by the condition 
that the alive links are decoupled. The method has the 
benefit that if the layout of the domain decomposition 
is properly matched to the compute node location there 
is no need to exchange link data during the MD evolu- 
tion. Thus the algorithm becomes a semi-local update 
algorithm. To ensure the ergodicity a random parallel 
translation of the lattice coordinate origin is required af- 
ter each HMC evolution. 
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In our case we employed the Iwasaki-gauge action and 
the 0(a)-iniproved Wilson fermion. These actions have a 
larger lattice extent compared to the unimproved action 
(the rectangular part of the gauge action and the clover 
term of the fermion action), and one may worry about the 
semi-locaUty of the MD evolution. Since the extension is 
still within two sites, we can conclude that the dead links 
are still domain connecting ones and those on the thin 
surface of the domains. Thus we can apply the same 
dead/alive link method as that for the unimproved case. 
For more extended gauge or fermion action the number 
of dead links should be enlarged to decouple the active 
links. 

The efficiency of the dead/alive link method depends 
on the ratio of the number of active links and all links, 
which is estimated as 



into two pieces as 



{NB~l){NB-2f/N% 



(A34) 



where Nb is a domain block size assuming N'j^ blocking. 
In this paper we employed Nb = 8, which results in 
~37% for the ratio. We employed the same algorithm 
for the random parallel translation as in Ref. [11]. 



APPENDIX B: MASS PRECONDITIONED 
DDHMC (MPDDHMC) ALGORITHM 

As the up-down quark mass is reduced toward the 
physical point, we observed strong MD instability with 
the DDHMC algorithm. The origin of the instability is 
the appearance of near zero or negative eigenvalues in 
the D spectrafSJ, [S^]- The corresponding eigenmodes 
yield a strong MD force and large fluctuations for the IR 
action (jA14cl) as described in the main text. We could 
handle the instability by reducing 5tie.- However this 
results in very high values of the HMC acceptance, e.g., 
> 90%, which is unnecessarily large compared with the 
optimal acceptance ratio, e.g., ~ 60-70% for 2nd order 
MD integrator |56j. 

We introduce Hasenbusch's heavy mass 
preconditioner |13l. [l4| to stabilize the IR part (jA14cp . 
and call the resulting algorithm MPDDHMC algorithm. 
We also implement several improvements in the algo- 
rithm. Our simulation with the lightest up-down quark 
mass corresponding to Kud — 0.13781 is finally carried 
out by the MPDDHMC algorithm. Here we describe the 
implementation details of the MPDDHMC algorithm. 



1. Hasenbusch's heavy mass preconditioning for 
DDHMC algorithm 

The mass preconditioner is introduced for the IR part 
action Eq. (|A14cp . The action is transformed and split 



det[D 



EE 



det 



r\spin 

^ spin 
^ EE 






\det[REE]\^ det[D''EE' 



(Bl) 



where the primed operator D' be is defined with the 
modified hopping parameter k' = pn keeping the clover 
term unchanged. Introducing the pseudo-fermion fields 
for each determinant we obtain 



det[Df"i 



'EE 



Vx^e'DxeVcIvQe 



xe "^9™ 



[UXe 



'[U,XB 



^qlK 



\{Ree) C,e\ 



s. 



qlR' 



[d'ee y 



XE 



(B2a) 
(B2b) 

(B2c) 



The action Eq. (IAl4cP is replaced by Eqs. (IB2bp and 
(|B2c| . Our MPDDHMC algorithm is based on this ac- 
tion. 

The parameter p is a tunable parameter and should be 
chosen close to but less than p = 1 while keeping R ^ 1 
so as to achieve optimal performance. For example, since 
the DDHMC simulation at k = 0.13770 ran successfully, 
we use p — 0.9995 at Kud = 0.13781 since we expect k' ^ 
0.13770 would lead to a stabilized behavior for Eq. (IB2c|l . 



2. Solver improvements 

As the quark masses are taken small, we encountered 
a solver stagnation or failure due to the presence of the 
near zero modes or the negative (real part) eigenmodes. 
In this case the GCR-SAP solver sometimes does not con- 
verge. Although this difficulty could be cured by chang- 
ing the solver algorithm or finely tuning the solver pa- 
rameters, i.e. u>, A'ssoRj ete., applying such remedies 
causes violation of the reversibility of the MD evolution 
when a loose stopping condition is adopted for the solver. 

To avoid this situation we decided to change the solver 
algorithm. Our strategy is to combine (1) a strict stop- 
ping condition, (2) applying the method of chronolog- 
ical guess, and (3) adopting a solver algorithm robust 
against near zero and negative eigenvalues. The use of 
strict stopping condition (1) gives us room to flexibly 
change the solver algorithm without the reversibility vi- 
olation, although this adds an extra computational cost. 
A part of the extra cost can be reduced by optimizing 
the choice of the initial vector (the chronological guess 
method [l^). It is also required to adopt a solver algo- 
rithm which is robust and fast against the ill conditioned 
case. We employ the inner-outer solver strategy and the 
deflation technique [53, |53, [5^ aiming for speed up and 
taming the difficult eigenmodes. 
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a. Inner-outer strategy 

The gap between the rapidly increasing floating point 
capabiUty of processors and the memory bandwidth is 
spreading because of the rather slow development of 
memory speed. To fill the gap the mixed precision or the 
inner-outer nested solver strategy has been proposed|60j. 
The outer solver must have the property that the pre- 
conditioner can be changed from iteration to iteration. 
Since the preconditioner can be replaced by another it- 
erative solver to make an approximation for the outer 
problem, the preconditioner can be called as the inner- 
solver for the outer solver. The inner-outer solver en- 
ables us the use of single precision which effectively dou- 
bles the memory bandwidth, data cache size, and pro- 
cessor registers |6ll|. The GCR-SAP solver proposed by 
Liischer[23| is also along this strategy. If the solver 
parameters can be chosen such that most of the com- 
putational time is spent in the inner-solver, we re- 
ceive a maximal benefit from the use of single precision 
arithmetic [6l|. 

In this work we developed a version of the BiCGStab 
algorithm which enables us to follow the inner-outer 
strategy |63|. The benefit of BiCGStab compared to GCR 
(or GMRES) type algorithms is that BiCGStab has a 
shorter recurrence iteration, small memory requirement, 
and no restarting. To make the BiCGStab solver flexible 
against substitutions of the preconditioner, we slightly 
modify the algorithm. The point of modification is the 
following. 

Any solver algorithm which has the following update 
point for the solution and residual vector can be modified 
to take the inner-outer solver form. To solve Ax = b, 
suppose that an algorithm has the lines 

[compute parameter a and pre-search vector p.] 

q = Ap, (B3a) 

r = r — aq, (B3b) 

X — X + ap, (B3c) 

where the method to obtain a and p depends on the outer 
solver algorithm. To enable a flexible preconditioner re- 
place these lines as 

[compute parameter a and pre-search vector p.] 

V = Mp, (B4a) 

q = Av, (B4b) 

r = r — aq, (B4c) 

X — X + av, (B4d) 

where M is a preconditioner and must be an approxima- 
tion for A~^ . The extra vector v is required to hold an in- 
termediate vector. In this modification the search vector 
q is produced for the equation AMy = b, while the solu- 
tion still keeps the solution-residual relation r = b — Ax 
of the unpreconditioned equation. The preconditioner M 
can be changed from iteration to iteration in the outer 



solver, as far as the the solution- residual relation is kept 
intact. In this way, a flexible preconditioner can be in- 
troduced for the outer algorithm. 

This modification is applicable to many solvers which 
have similar local update points (CG, MR, CGS, etc.). 
The iterative refinement or Richardson iteration[6l| is 
the simplest example. Solvers of GMRES type is also 
modified along this strategy. Longer recurrence relations 
of those algorithms require a series of extra vectors such 
as the above vector v (for ex. GCR, FGMRES), however. 

We implement this modification to the BiCGStab 
solver and replace two update points. The preconditioner 
M is replaced by the single precision solver for Ax = b 
with the appropriate precision conversion interface (sin- 
gle to double and vice versa) . The tolerance of the inner 
solver can be relaxed as the outer residual approaches 
the desired tolerance, and this also reduces the cost of 
the inner-solver. We use the following tolerance control 
method for the inner-solver. 



tohnner = mill | HiaX | — — , 10 



10 



-3 



(B5) 



where errouter is the relative residual norm |6— Ax|/|6| for 
the outer solver. When the residual gap to the desired 
tolerance is larger than 10^^, the inner solver is called 
with 10~^ tolerance which is the limit of single precision 
arithmetic. As the outer residual decreases the inner- 
solver tolerance is relaxed. 

The flexible BiCGStab algorithm is applied to both the 
IR and the UV problems. We solve 



iD{Mz) = w, 



(B6) 



with a flexible preconditioner M for the IR problem 
Eq. (|A30b|) (heavy mass k! version is also modifled), and 



{Dee)ssok{Mz) = d, 



(B7) 



with a flexible preconditioner M for the UV problem 
Eq. (|A26bp . With these setup the flexible BiCGStab calls 
the inner solver 3 to 5 times to obtain the double preci- 
sion solution. 



6. Inner solver and deflation technique 

Because the outer solver is well preconditioned by an 
inner-solver, the residual stagnation or convergence fail- 
ure should not take place. However the problem of the 
near zero modes still remains and is left to the inner- 
solver to handle. 

As explained in the main text we use the combina- 
tion of BiCGStab and GCRO-DR^ solvers. The in- 
ner solver usually uses BiCGStab. If residual stagna- 
tion or breakdown is detected the solver restarts with 
the GCRO-DR algorithm. The use of GCRO-DR is the 
key point to handle the ill conditioned problem in our 
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algorithm. The GCRO-DR incorporates the so-called de- 
flation technique which removes or deflates the ill condi- 
tioned eigenmodes from the matrix spectrum as has been 
described in the literature |57l ISSl [59^ . 

GCRO-DR solver has the following properties; (1) 
solves a linear equation and its eigensubspace simulta- 
neously, (2) deflates the eigenmodes from the coefficient 
matrix and reduces the condition number, (3) can recycle 
the eigenmodes for another linear equation with the same 
or perturbed coefficient matrix but different right-hand 
vectors. 

Since the inner-solver is to be called several times by 
the outer-solver and the outer-solver is to be called many 
times during the MD evolution, the property (2) might 
largely help to solve the ill conditioned problem. The 
property (3) opens the possibility of reusing the defla- 
tion subspace among the MD evolution steps for possible 
further speedup. 

Unfortunately the performance of GCRO-DR algo- 
rithm highly depends on the problem to be solved, and 
we observed that the overhead is large compared to nor- 
mal BiCGStab for well conditioned cases. One may con- 
sider reusing the deflation subspace generated by GCRO- 
DR for the so-called deflated BiCGStab (D-BiCGStab) 
algorithm. However, for well conditioned cases the over- 
head is still rather large and no improvement is observed. 
Moreover we observed that the rate of the occurence of ill 
conditioned cases is low. We, therefore, use the normal 
BiCGStab algorithm for a flrst attack, and switch the 
solver to GCRO-DR only when the stagnation or break- 



down is detected as described above, otherwise continue 
to use the un-deflated BiCGStab. Once the inner solver 
is switched to the GCRO-DR solver, GCRO-DR is kept 
being used until the outer iteration converges. If there is 
another linear equation with the same ill-conditioned co- 
efficient matrix in the MD force calculation, GCRO-DR 
continues with recycling the deflation subspace. 

The actual equation to be solved by the inner solver 
is as follows. For the IR problem Eq. (jB6p to obtain 
t ~ Mz ^ {D)~^z, we use 



DKs = z, 
t = Ks, 



(B8a) 
(B8b) 



where Eq. ((B8a| is solved by BiCGStab or GCRO-DR 
algorithms, and computation are entirely done with sin- 
gle precision. The deflation subspace is spanned for DK 
when switching occurs. Similarly we solve 



(Dee) 



EEJSSOR 



t = Z, 



(B9) 



for the UV problem Eq. (|B7)) to obtain t ^ Mz ~ 

{Dee)ssor^- 

The parameters for the GCRO-DR algorithm is the 
maximal dimeinsion of Krylov subspace iVxv and the di- 
mension of deflation/recycling subspace A^rec ■ The ini- 
tial value is set to (A^kv, A^rec)=(40,20), and it is auto- 
matically enlarged when slow convergences are observed. 
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TABLE I: Simulation parameters. MD time is the number of trajectories multiplied by the trajectory length r. rint[-P] denotes 
the integrated autocorrelation time for the plaquette. CPU time for unit r using 256 nodes of PACS-CS is also listed. 



«:ud 


0.13700 


0.13727 


0.13754 


0.13754 


0.13770 


0.13781 


K.B 


0.13640 


0.13640 


0.13640 


0.13660 


0.13640 


0.13640 


#run 


1 


1 


2 


4 


2 


5 


r 


0.5 


0.5 


0.5 


0.5 


0.25 


0.25 


(iVo,iVl,iV2,iV3) 


(4,4,10) 


(4,4,14) 


(4,4,20) 


(4,4,28) 


(4,4,16) 


(4,4,4,6) 
(4,4,6,6) 


p 


- 


- 


- 


- 


- 


0.9995 


Npoly 


180 


180 


180 


220 


180 


200 


Replay 


on 


on 


on 


on 


on 


off 


MD time 


2000 


2000 


2250 


2000 


2000 


990 


(^> 


0.569105(18) 


0.569727(14) 


0.570284(16) 


0.570554(17) 


0.570573(20) 


0.570868(9) 


(e-^«) 


0.9922(85) 


1.0016(50) 


1.0013(56) 


0.9993(36) 


0.9944(53) 


0.970(12) 


Pacc(HMC) 


0.8020(63) 


0.8672(47) 


0.8573(52) 


0.9140(44) 


0.8397(41) 


0.8033(63) 


Pacc(GMP) 


0.9529(37) 


0.9439(34) 


0.9331(40) 


0.9330(41) 


0.9537(26) 


0.9670(32) 


rint[P] 


8.6(3.1) 


20.9(10.2) 


9.8(2.8) 


6.3(1.4) 


25.2(15.2) 


2.9(1.9) 


CPU hour/unit r 


0.29 


0.44 


1.3 


1.1 


2.7 


7.1 



TABLE II: Smearing parameters A and B for the ud and s quark propagators. 



^ud 


Ks 


#source 


Aud 


Bud 


As 


B, 


0.13700 


0.13640 


1 


1.2 


0.21 


1.2 


0.28 


0.13727 


0.13640 


1 


1.2 


0.19 


1.2 


0.25 


0.13754 


0.13640 


4 


1.2 


0.17 


1.2 


0.25 


0.13754 


0.13660 


4 


1.2 


0.17 


1.2 


0.25 


0.13770 


0.13640 


4 


1.2 


0.09 


1.2 


0.21 


0.13781 


0.13640 


4 


1.2 


0.07 


1.2 


0.20 
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TABLE III: Meson and baryon masses in lattice units at each combination of Kud and Kg. x^/dof for the fit is also presented 
in the second row of each channel. The fit range is [13, 30] for pseudoscalar mesons, [10, 20] for vector mesons, [10, 20] for octet 
baryons and [8, 20] for decuplet baryons. 



/^ud 


0.13700 


0.13727 


0.13754 


0.13754 


0.13770 


0.13781 


Ks 


0.13640 


0.13640 


0.13640 


0.13660 


0.13640 


0.13640 


TT 


0.32242(65) 


0.26191(73) 


0.18903(79) 


0.17671(129) 


0.13593(140) 


0.07162(299) 




0.015 


0.008 


0.002 


0.001 


0.001 


0.004 


K 


0.36269(61) 


0.32785(74) 


0.29190(67) 


0.26729(110) 


0.27282(103) 


0.25454(97) 




0.016 


0.015 


0.002 


0.001 


0.001 


0.025 


»?ss 


0.39947(58) 


0.38380(74) 


0.36870(71) 


0.33490(93) 


0.36289(103) 


0.35306(82) 




0.017 


0.015 


0.000 


0.002 


0.001 


0.016 


p 


0.5060(30) 


0.4566(36) 


0.4108(31) 


0.3963(53) 


0.3895(94) 


0.3503(315) 




0.043 


0.229 


0.017 


0.090 


0.005 


0.418 


K* 


0.5314(23) 


0.4954(32) 


0.4665(23) 


0.4428(37) 


0.4525(35) 


0.4316(47) 




0.088 


0.068 


0.007 


0.014 


0.003 


0.092 





0.5560(17) 


0.5325(28) 


0.5156(21) 


0.4849(26) 


0.5105(26) 


0.4949(15) 




0.124 


0.015 


0.002 


0.007 


0.001 


0.026 


N 


0.7277(22) 


0.6487(56) 


0.5584(53) 


0.5331(71) 


0.5025(87) 


0.4285(360) 




0.077 


0.027 


0.358 


0.014 


0.171 


1.138 


A 


0.7557(23) 


0.6913(45) 


0.6208(36) 


0.5857(42) 


0.5764(65) 


0.5240(95) 




0.115 


0.029 


0.089 


0.015 


0.018 


0.154 


S 


0.7606(20) 


0.7039(51) 


0.6437(39) 


0.6052(48) 


0.6044(71) 


0.5601(99) 




0.072 


0.030 


0.041 


0.091 


0.043 


1.377 


'Z^ 


0.7859(25) 


0.7399(43) 


0.6910(30) 


0.6474(32) 


0.6655(46) 


0.6405(31) 




0.139 


0.035 


0.028 


0.020 


0.010 


0.008 


A 


0.8290(42) 


0.7694(84) 


0.6956(66) 


0.6731(86) 


0.6438(90) 


0.5798(378) 




0.046 


0.022 


0.102 


0.038 


0.860 


0.421 


S* 


0.8537(35) 


0.8039(74) 


0.7464(43) 


0.7149(74) 


0.7097(67) 


0.6885(140) 




0.037 


0.010 


0.022 


0.036 


0.179 


0.031 


1^* 


0.8788(30) 


0.8395(67) 


0.7964(41) 


0.7579(60) 


0.7740(58) 


0.7549(67) 




0.036 


0.005 


0.005 


0.035 


0.022 


0.255 


n 


0.9038(29) 


0.8754(61) 


0.8456(37) 


0.8001(49) 


0.8342(52) 


0.8142(34) 




0.050 


0.003 


0.009 


0.032 


0.015 


0.206 



TABLE IV: Quark masses in the MS scheme at the scale of 1/a and pseudoscalar decay constants at each combination of Kud 
and Kg. Both are renormalized at one-loop level. The values for m%/m^^ and fx / fi^ are also listed. 



Kud 


0.13700 


0.13727 


0.13754 


0.13754 


0.13770 


0.13781 


Kb 


0.13640 


0.13640 


0.13640 


0.13660 


0.13640 


0.13640 


am^d' 


0.030753(110) 


0.020834(66) 


0.011028(80) 


0.009666(105) 


0.005644(120) 


0.001609(118) 


amf'' 


0.047142(110) 


0.044674(72) 


0.042355(79) 


0.035571(98) 


0.041285(94) 


0.039913(62) 


ms/niud 


1.5329(20) 


2.1443(39) 


3.841(21) 


3.680(30) 


7.32(14) 


24.8(1.8) 


afn 


0.0898(12) 


0.0853(18) 


0.07481(51) 


0.07262(60) 


0.06973(78) 


0.0656(35) 


afK 


0.0942(13) 


0.0916(15) 


0.08432(56) 


0.08058(40) 


0.08089(57) 


0.0777(13) 


2 ! AWI 


3.708(13) 


3.610(14) 


3.558(16) 


3.542(25) 


3.585(29) 


3.732(73) 


Ik/U 


1.0485(13) 


1.0739(57) 


1.1271(16) 


1.1095(64) 


1.1601(73) 


1.186(48) 
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TABLE V: PACS-CS and CP-PACS/JLQCD results for the hadron masses at (fCud, Hs) = (0.13700, 0.13640). [imiii,imax] denotes 
the fitting range. 



lattice size 



mN 



PACS-CS 


32'^ X 64 


0.32242(65) 


0.5060(30) 


0.7277(22) 


[t^miri) (■maxj 




[13,30] 


[10,20] 


[10,20] 


CP-PACS/JLQCD 


20^ X 40 


0.32247(74) 


0.5157(21) 


0.7337(28) 


[f-min^ CniaxJ 




[9,17] 


[9,15] 


[9,15] 



TABLE VL Results for the low energy constants in the SU(3) ChPT together with the phenomenological estimates and the 
RBC/UKQCD and MILC results, /o is perturbatively renormalized at one-loop level. Z/4,5,6,8 are in units of 10~^ at the scale 
of 770MeV. {uu) and {uu)o are renormalized in the MS scheme at 2 GeV. 





PACS-CS 


phenomenology [37, ^1 


RBC/UKQCD [39] 


MILC [40] 




w/o FSE 


w/ FSE 








aBo 


1.789(34) 


1.778(34) 


- 


2.35(16) 


- 


afo 


0.0534(38) 


0.0546(39) 


- 


0.0541(40) 


- 


/o[GeV] 


0.1160(88) 


0.1185(90) 


0.115 


0.0935(73) 


- 


U/fo 


1.159(57) 


1.145(56) 


1.139 


1.33(7) 


1.21(5) (If) 


m^^Bo[GeV^] 


0.00859(10) 


0.00859(11) 


0.0181 


- 


- 


mP'^BolGeY^] 


0.2550(36) 


0.2534(36) 


0.434 


— 


— 


a (uu)o 


-(0.132(6))^ 


-(0.133(6))^ 


- 


- 


- 


(nM>o[GeV^] 


-(0.286(15))^ 


-(0.290(15))^ 


— 


— 


- (0.242(9) (tlr) (4))' 


L4 


-0.04(10) 


-0.06(10) 


0.00(80) 


0.139(80) 


0.1(3) (1?) 


L5 


1.43(7) 


1.45(7) 


1.46(10) 


0.872(99) 


1.4(2) (t?) 


21/6 — Li 


0.10(2) 


0.10(2) 


0.0(1.0) 


-0.001(42) 


0.3(1) {+_l) 


2Ls - Ls 


-0.21(3) 


-0.21(3) 


0.54(43) 


0.243(45) 


0.3(1)(1) 


xVdof 


4.2(2.7) 


4.4(2.8) 


— 


0.7 


— 



TABLE VIL Results for the low energy constants in the SU(2) ChPT obtained by the conversion from those in the SU(3) ChPT. 
The RBC/UKQCD and MILC results are also given for comparison. / and /o are perturbatively renormalized at one-loop 
level, {uu) and {uu)o are renormalized in the MS scheme at 2 GeV. 



PACS-CS 



v/o FSE 



v/ FSE 



RBC/UKQCD [39] 



MILC [40] 



aB 


1.950(31) 


1.935(30) 


2.457(78) 


af 


0.0582(19) 


0.0588(19) 


0.0661(18) 


/[GeV] 


0.1263(51) 


0.1277(51) 


0.1143(36) 


U/f 


1.065(8) 


1.062(8) 


- 


mf^B[GeY'] 


0.009364(36) 


0.009352(34) 


- 


m^^B[GeV^] 


0.2780(52) 


0.2758(49) 


- 


a^{uu) 


-(0.143(3))^ 


-(0.144(3))^ 


- 


(uu)[GeY^] 


-(0.310(9))^ 


-(0.312(10))^ 


- 


h 


3.50(11) 


3.47(11) 


2.87(28) 


h 


4.22(10) 


4.21(11) 


4.10(5) 



1.052(2) 



(0.278(1) itl) (5))' 
1-2(6) t\i) 
4.4(4) (tt) 



B/Bo 

///o 

{uu)/{uu)a 



1.090(15) 
1.089(45) 
1.268(10) 



1.089(15) 
1.078(44) 
1.245(10) 



1.15(5) m 

1.52(17) (t?|) 



TABLE VIII: Comparison of [3,4- 
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group 


#fiavor 


quark action 


ChPT 






/3 


I4 


this work 


2+1 


NP clover 


SU(2) 


w/c 


FSE 


3.23(21) 


4.10(20) 








SU(2) 


w/ 


FSE 


3.14(23) 


4.04(19) 








SU(3) 


w/c 


)FSE 


3.50(11) 


4.22(10) 








SU(3) 


w/ 


FSE 


3.47(11) 


4.21(11) 


RBC/UKQCD[391 


2+1 


DWF 


SU(3) 

SU(2) 






2.87(28) 
3.13(33)(24) 


4.10(5) 
4.43(14) (77) 


MILC[40] 


2+1 


KS 


SU(3) 






1-1(6) (lli) 
3.44(57) (ti) itl') 


4.4(4) (It) 


JLQCD[63] 


2 


Overlap 


SU(2) 






4.14(26) (tf) i+f) 


ETMf64] 


2 


TM 


SU(2) 






3.44(8) (35) 


4.61(4)(11) 


CERNf26] 


2 


Wilson + NP clover 


SU(2) 






3.0(5)(1) 


- 


CGL[6a 


- 





SU(2) 






- 


4.4(2) 


GL[41J 


- 





SU(2) 






2.9(2.4) 


4.3(9) 



TABLE IX: Results for the low energy constants in the SU(2) ChPT fit together with the phenomenological estimates and the RBC/UKQCD results. B = Bb +msBB 
and f — fs + ?Tis/s are given at the physical strange quark mass. / and /o are perturbatively renormalized at one-loop level. Range I, II, III denote the selection of 
data sets corresponding to n^d > 0.13754, Kud > 0.13727, 0.13770 > Kud > 0.13727, respectively, (uu) and {uu)o are renormalized in the MS scheme at 2 GeV. 









PACS-CS 






phenomenology 


RBC/UKQCD [39] 




Rang! 


3l 


Range 


;II 


Range 


III 








w/o FSE 


w/ FSE 


w/o FSE 


w/FSE 


w/o FSE 


w/ FSE 






aB 


1.907(36) 


1.891(35) 


1.941(20) 


1.931(21) 


1.947(20) 


1.942(20) 


- 


2.414(61)(115) 


af 


0.0573(23) 


0.0581(21) 


0.0547(13) 


0.0553(14) 


0.0541(13) 


0.0544(13) 


- 


0.0665(21) (47) 


/[GeV] 


0.1248(51) 


0.1264(47) 


0.1181(30) 


0.1194(31) 


0.1158(28) 


0.1165(28) 


0.1219(7) [66] 


0.1148(41)(81) 


Mf 


1.063(8) 


1.060(7) 


1.074(5) 


1.072(5) 


1.078(5) 


1.077(5) 


1.072(7)[66] 


1.080(8) 


mf,B[GeV'] 


0.009345(27) 


0.009332(26) 


0.009381(16) 


0.009372(17) 


0.009391(17) 


0.009387(16) 


- 


0.00937(57)(64) 


mP'^B[GeV2] 


0.2709(43) 


0.2686(43) 


0.2782(25) 


0.2768(26) 


0.2794(26) 


0.2787(26) 


- 


0.270(16)(18) 


a^{uu) 


-(0.141(3))^ 


-(0.142(3))3 


-(0.138(2))3 


-(0.138(2))3 


-(0.137(2))^ 


-(0.137(2))=* 


- 


- 


{uu)[GeV^] 


-(0.307(8)f 


-(0.309(7)f 


-(0.297(5)f 


-(0.299(5)f 


-(0.293(5)f 


-(0.294(4))3 


- 


-(0.255(8)(8)(13))^ 


h 


3.23(21) 


3.14(23) 


3.32(10) 


3.28(11) 


3.31(10) 


3.30(10) 


2.9(2.4)[41] 


3.13(33)(24) 


h 


4.10(20) 


4.04(19) 


4.32(9) 


4.28(10) 


4.36(9) 


4.34(9) 


4.4(2)[65j 


4.43(14)(77) 


B/Bo 


1.066(15) 


1.064(15) 


- 


- 


- 


- 


- 


1.03(5) 


f/fo 


1.073(55) 


1.065(58) 


— 


— 


— 


— 


— 


1.229(59) 


{uu)/{uu)o 


1.228(13) 


1.205(14) 


- 


- 


- 


- 


- 


1.55(21) 


xVdof 


0.33(68) 


0.43(77) 


2.0(1.0) 


2.3(1.1) 


2.8(1.8) 


3.0(1.8) 


— 


0.3 



to 

OS 
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TABLE X: Cutoff, renormalized quark masses, pseudoscalar meson decay constants determined witti m,^, rriK, rnn inputs. 
Quark masses are renormalized at 2 GeV. 

experiment [46] 



m^i 



[GeV] 

[MeV] 

mf =5 [MeV] 
ms/rriud 
U [MeV] 
Ik [MeV] 



w/o FSE 


ptiysical 


point 


w/ FSE 


2.176(31) 
2.509(46) 
72.74(78) 

29.0(4) 
132.6(4.5) 
159.2(3.2) 
1.201(22) 






2.176(31) 
2.527(47) 
72.72(78) 

28.8(4) 
134.0(4.2) 
159.4(3.1) 
1.189(20) 



130.7 ±0.1 ±0.36 

159.8 ±1.4 ±0.44 
1.223(12) 



TABLE XL Meson and baryon masses at ttie ptiysical point in physical units, m^, rriK, rnn are inputs. 



channel 



experiment [GeV] [46] 



physical point [GeV] 



w/o FSE 



w/ FSE 



TT 

K 

P 
K* 

'!> 

N 
A 
E 

A 

E* 

n 



0.1350 

0.4976 

0.7755 

0.8960 

1.0195 

0.9396 

1.1157 

1.1926 

1.3148 

1.232 

1.3837 

1.5318 

1.6725 



0.776(34) 

0.896(9) 

1.0084(40) 

0.953(41) 

1.092(20) 

1.156(17) 

1.304(10) 

1.274(39) 

1.430(23) 

1.562(9) 



0.776(34) 

0.896(9) 

1.0084(40) 

0.953(41) 

1.092(20) 

1.156(17) 

1.304(10) 

1.275(39) 

1.430(23) 

1.562(9) 



TABLE XIL ro at each hopping parameter and the physical point. The first error at the physical point is statistical and the 



second and the third 


ones 


are 


the 


systematic 


uncertainties due to the choice of imj 


n and 


^min, 


respectively. 


^ud 










Ks 






ro 



0.13700 
0.13727 
0.13754 
0.13754 
0.13770 
0.13781 



0.13640 
0.13640 
0.13640 
0.13660 
0.13640 
0.13640 



4.813(30)(+40)(±13) 

4.879(38)(+35)(±74) 

5.121(21)(+82)(+9) 

5.276(28)(+85)(+8) 

5.176(23)(±54)(±8) 

5.276(33)(±112)(-3) 



physical point 



5.427(51)(±81)(-2) 



TABLE XIIL PACS-CS and CP-PACS/JLQCD results for ro in lattice units at (Kud,Ks) = (0.13700,0.13640). 
errors are the same as in Table IXllI 


Meaning of 


lattice size ro 


PACS-CS 32^ X 64 4.813(30) (±40) (±13) 
CP-PACS/JLQCD 20^x40 4.741(33)(±323)(±30) 
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FIG. 1: Simulation cost at (Kud,Ks) = (0.13770,0.13640) by DDHMC (blue open circle) and (Kud,Ks) = (0.13781,0.13640) by 
MPDDHMC (blue closed circle) for 10000 trajectories. Solid line indicates the cost estimate of Nf =2 + 1 QCD simulations 
with the HMC algorithm at a = 0.1 fm with L = 3 fm for 100 independent configurations. Vertical line denotes the physical 
point. 
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FIG. 2: Plaquette history (left) and normalized autocorrelation function (right) for (Kud,Ks) = (0.13727,0.13640). Horizontal 
lines in the left denote the average value of the plaquette with one standard deviation error band. 
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FIG. 3: Effective masses for the mesons (left) and the baryons (right) at (Kud,Ks) = (0.13754,0.13640). Horizontal lines 
represent the fitting resuhs with one standard deviation error band. 
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FIG. 4: Same as Fig.|3]for (Kud,Ks) = (0.13754,0.13660). 
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FIG. 5: Same as Fig.|3]for (K^d.Ks) = (0.13770,0.13660). 
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FIG. 6: Same as Fig.|3]for (K^d.Ks) = (0.13781,0.13660). 



32 



0.008 



0.006 



0.004 



0.002 



0.000 



5m 



100 



I 



=0.13754/K=0.13640 
=0.I3754/K=0.13660 
=0.13770/K=0.13640 
=0.13781/K =0.13640 



I 



200 300 400 

binsize [x] 



500 



0.004 



0.003 



0.002 



0.001 



0.000 



5m 



mS 



m K^_,=0.13754/K=0.13640 

■ K^_j=0.13754/K=0.13660 

A K^_j=0.13770/K=0.13640 

* K=0.13781/K =0.13640 

ud s 



\\\ u } :: 



100 



200 300 400 

binsize [x] 



FIG. 7: Binsize dependence of the magnitude of error for m,r (left) and m^^^ (right) at Kud > 0.13754. 
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FIG. 11: SU(3) ChPT fit for ml/m^^^ (left) and 2m|f/(m^7' + m^^^) (right). 



0.10 



0.09 



0.08 



0.07 



0.06 



0.05 



0.01 



AWI 

mud 






• 


K =0.13640 


o 


K =0.13660 


▲ 


SU(3) 


T 


SU(3)-FSE 


* 


SU(3)@ph 


* 


SU(3)-FSEeph 



0.02 



0.03 



O.lOOp 
0.095 - 
0.090 - 
0.085 - 
0.080 - 
0.075 - 



t 



0.070' 1^ 



K 



■>* 



K =0.1 3640 
K =0.1 3660 

▲ SU(3) 

Y SU(3)-FSE 

# SU(3)(aph 

* SU(3)-FSE®ph 



o 



_L 



0.01 



0.02 



AWI 

mud 



FIG. 12: SU(3) ChPT fit for /^ (left) and //,- (right). 
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FIG. 13: Ratio of the NLO contribution to the LO one in the SU(3) ChPT fit for ra\lm^^ (left) and 'lm\l{m'^^ + m^ 
(right). 
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FIG. 14: Ratio of the NLO contribution to the LO one in the SU(3) ChPT fit for /^ (left) and Jk (right). 
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FIG. 16: SU(2) ChPT fit for /^ (left) and fx (right). 
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FIG. 21: Same as Fig. [20]for the octet baryons. 
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FIG. 22: Same as Fig. [20]for the decuplet baryons. 
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FIG. 23: Light hadron spectrum extrapolated to the physical point using m-^, tuk and mQ as input. Horizontal bars denote 
the experimental values. 



1.00 
0.96 
0.92 h 
0.88 

1.30 

1.20 h 
1.10 

2.00 

1.50 

1.00 



V^,.(r=4) 



V,,,(r=8) 



V,,(r=12) 



0123456789 

t 

FIG. 24: Effective potential Veff(r, i) with r = 4, 8, 12 at /tud ~ 0.13770 as a representative case. 



43 



2.0 



1.5 



1.0 



0.5 



V(r) 




FIG. 25: Static quark potential V{r) at K^d = 0.13770 as a representative case. Solid line denote the fit result with Eq. (|79 
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FIG. 26: Linear chiral extrapolation for 1/ro at the physical point. Red triangles denote the fit results at the measured values 

of m^r. 



